Using Dialogue Corpora to Extend Information Extraction Patterns for Natural Language Understanding of Dialogue
Abstract
This paper examines how Natural Language Processing (NLP) resources and online dialogue corpora can be used to extend the coverage of Information Extraction (IE) templates in a spoken dialogue system. IE templates are used as part of a Natural Language Understanding (NLU) module for identifying meaning in a user utterance. The use of NLP tools in dialogue systems is a difficult task, given that 1) spoken dialogue is often not well-formed and 2) there is a serious lack of dialogue data. In spite of this, we have devised a method for extending IE patterns using standard NLP tools and dialogue corpora available on the web. In this paper, we explain our method, which includes using a set of NLP modules developed with GATE (the General Architecture for Text Engineering), as well as a general-purpose editing tool that we built to facilitate the IE rule creation process. Lastly, we present directions for future work in this area.

1. Information Extraction for Dialogue

Why use Information Extraction for dialogue? IE techniques for extracting meaning are generally applied to text documents, for example newspaper reports, scientific papers, or blogs, rather than to transcribed spoken dialogues. However, we have chosen to apply IE to dialogue for the following reason: dialogue utterances tend not to be well-formed sentences, yet they convey meaning to the hearer. Since utterances are not well-formed, a full-parsing method is less desirable than a pattern-matching approach with shallow syntactic parsing to identify NPs and VPs. This lends itself to an IE template-based approach. We devised our method while developing a demonstrator for a dialogue system in the domain of office chat (for the EU-funded Companions project), but it could be applied to any dialogue domain. In our system, IE patterns are part of a Natural Language Understanding module (Figure 1).
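To illustrate why shallow pattern matching is more robust to ill-formed speech than full parsing, the sketch below (our own minimal example, not the paper's GATE implementation; the pattern and function names are hypothetical) matches a trigger-word template even in a disfluent utterance that no full parser would accept as a grammatical sentence:

```python
import re

# Hypothetical IE-style template: a first-person subject, any filler
# material, then a trigger word signalling a WORRY-type attitude.
# Disfluencies ("um", "like", repetitions) simply fall into the gap
# covered by the non-greedy .*? and do not break the match.
WORRY_PATTERN = re.compile(
    r"\b(i|i'm|im)\b.*?\b(worried|troubled|concerned|afraid)\b",
    re.IGNORECASE,
)

def shallow_match(utterance):
    """Return the matched trigger word, or None if no match."""
    m = WORRY_PATTERN.search(utterance)
    return m.group(2).lower() if m else None

# An ill-formed, disfluent input still matches:
print(shallow_match("um I'm like... worried I guess"))  # worried
```

A full parser would reject this utterance outright, whereas the template degrades gracefully: it recovers the attitude trigger while ignoring the disfluent material around it.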
While all inter-module communication in the overall system takes place via a blackboard, this module in effect takes input from the upstream Speech Recognition and Dialogue Act tagging modules for the current user utterance, and from the downstream Dialogue Manager for the system response to the previous utterance. It outputs a shallow meaning representation for the current user utterance, which is passed on to the Dialogue Manager for formulating a response to the user. For example, one sort of input our system must handle in the domain of office chat is utterances that express the user's emotional attitude about their day, project, or task. Such utterances may be conceived of as ATTITUDE relations between a person and a day, project, or task, with subtypes WORRY, HAPPY, ANNOY, etc. The WORRY relation is signalled by words such as 'worry', 'be worried', 'be troubled', 'be concerned', 'be afraid', and so on. Relations also have attributes which, in our domain, are attitude-type (WORRY, HAPPY, etc.), attitude-subject (Person), and attitude-object (Person, Project, or Task). Given this framework, the NLU output for the sentence "I'm a bit worried." may be expressed abstractly in a logical form representation as:

object(user:person), object(e1:Person), attribute(e1, user, true),
object(r1:attitude), attribute(r1, subtype, worry),
attribute(r1, attitude-subject, e1), attribute(r1, attitude-object, unknown)

Such representations, in which both entities and relations are reified, are convenient given partial information, which may result either from imperfect analysis or from information being distributed across multiple sentences. To derive such meaning representations automatically we use entity and relation extraction techniques. To develop entity and relation extractors one can pursue either a
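The reified representation described above can be mirrored directly in code. The sketch below (a minimal illustration using our own function names and data layout, not the project's actual data structures) emits the entity and relation facts for "I'm a bit worried.", using the WORRY trigger words listed in the text:

```python
# Trigger words signalling the WORRY subtype, as listed in the text.
WORRY_TRIGGERS = {"worry", "worried", "troubled", "concerned", "afraid"}

def extract_attitude(utterance):
    """Return a reified meaning representation (object and attribute
    facts) for an ATTITUDE relation, or an empty list if no trigger
    word is found. Unknown slots stay 'unknown', so the representation
    tolerates partial information."""
    words = [w.strip(".,!?") for w in utterance.lower().split()]
    if not (WORRY_TRIGGERS & set(words)):
        return []
    return [
        ("object", "e1", "Person"),
        ("attribute", "e1", "user", True),            # the speaker
        ("object", "r1", "attitude"),
        ("attribute", "r1", "subtype", "worry"),
        ("attribute", "r1", "attitude-subject", "e1"),
        ("attribute", "r1", "attitude-object", "unknown"),
    ]

for fact in extract_attitude("I'm a bit worried."):
    print(fact)
```

Because both the entity (e1) and the relation (r1) are first-class objects, a later utterance ("...about the project") can simply add a fact filling the attitude-object slot rather than forcing a reanalysis.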